Mismatches between Traditional Optimization Analyses and Modern Deep Learning
You may remember our previous blog post showing that it is possible to do state-of-the-art deep learning with a learning rate that increases exponentially during training. It was meant as a dramatic illustration that what we learned in optimization classes and books isn't always a good fit for modern deep learning, specifically for normalized nets, which is our term for nets that use any of the popular normalization schemes, e.g., Batch Normalization (BN). Today's post (based upon our paper with Kaifeng Lyu at NeurIPS 2020) identifies other surprising incompatibilities between normalized nets and traditional analyses. We hope this will change the way you teach and think about deep learning! Before diving into the results, we recall that normalized nets are typically trained with weight decay (aka $\ell_2$ regularization).
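To make the setting concrete, recall that with learning rate $\eta$ and weight-decay coefficient $\lambda$, the SGD update is $w_{t+1} = (1-\eta\lambda)\,w_t - \eta\nabla\mathcal{L}(w_t)$. Below is a minimal sketch (assuming PyTorch; the toy model, hyperparameters, and schedule are illustrative, not our exact experimental setup) of a normalized net trained with SGD plus weight decay, together with an exponentially increasing learning-rate schedule of the kind discussed in the earlier post.

```python
# Minimal sketch (PyTorch assumed): a toy "normalized net" (it uses BatchNorm)
# trained with SGD + weight decay and a learning rate that grows exponentially.
# All names and hyperparameters here are illustrative.
import torch
import torch.nn as nn

model = nn.Sequential(                 # conv -> BN -> ReLU -> pool -> linear
    nn.Conv2d(3, 16, 3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 10),
)

# weight_decay is the l2-regularization coefficient, applied in the update
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)

# gamma > 1 makes the learning rate increase exponentially each epoch
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=1.05)

loss_fn = nn.CrossEntropyLoss()
for epoch in range(3):                 # dummy data stands in for a real loader
    x = torch.randn(8, 3, 32, 32)
    y = torch.randint(0, 10, (8,))
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()
    optimizer.step()
    scheduler.step()                   # grow the learning rate after each epoch
```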